Comparison of algorithms for the prediction of glucose levels in patients with diabetes

No 26, Vol. 13 (2), 2021. ISSN 2007 – 0705, pp.: 1 – 19

This work presents a comparison between two algorithms for the prediction of glucose levels in diabetic patients using a univariate time series. The algorithms are applied to the history of fasting glucose levels to predict the five following values. The comparison is performed between 1) autoregressive neural networks (ARNN) and 2) autoregressive integrated moving average (ARIMA) models. A total of 70 series are analyzed, and we show that the ARIMA model yields percentage errors of the predicted value relative to the expected value higher than 25%. In contrast, for the autoregressive neural networks, the percentage error was less than 25% in 73% of the cases.


Introduction
Diabetes mellitus is a chronic degenerative disease characterized by high blood glucose levels. It occurs when the pancreas stops producing insulin, when it does not produce enough of it, or when the organism cannot use insulin properly. Lack of insulin produces high glucose levels in the blood. This phenomenon is known as hyperglycemia and, in the long term, can severely damage many of the body's systems, e.g., the cardiovascular and nervous systems (Wilmot et al., 2012). Consequently, a group of metabolic diseases such as cardiovascular diseases, neuropathy, nephropathy, retinopathy, and blindness might follow a diabetes diagnosis. By controlling the blood glucose levels, some of these diseases might be prevented or delayed (Harris et al., 1987). Diabetes is diagnosed by testing blood glucose levels (World Health Organization [WHO], 2016), when one or more of the following criteria are satisfied: 1) the fasting blood glucose level is greater than or equal to 126 mg/dl; 2) the blood glucose level remains elevated two hours after ingesting 75 g of glucose; 3) the blood glucose level taken at random is greater than 200 mg/dl. Many diabetes patients periodically monitor their glucose levels and use insulin shots to compensate for the insufficient insulin production of the pancreas. These patients might benefit from tools that help them decide when to apply insulin (Amaris et al., 2017). A predictive algorithm might be beneficial in these cases: if the historical glucose levels follow a pattern, then their future values might be anticipated. For example, Zhao et al. (2012) predict glucose levels from continuous monitoring data using autoregressive models with exogenous inputs, which express the future glucose levels as a linear combination of current and recent glucose levels. In that reference, a latent-variable-based technique is used to develop an empirical model for predicting the patient's glucose levels.
Glucose levels are known for their instability and nonlinearity. For example, Frandes et al. (2017) modeled the glucose dynamics using nonlinear chaotic properties. Ståhl and Johansson (2009) showed how to estimate quantitative predictive models to design optimal insulin levels for the patients. Three aspects were considered: 1) insulin, 2) glucose, and 3) the insulin-glucose interaction, and different black-box and gray-box models were developed and analyzed. The models' short-term predictors for the glucose levels were designed to achieve prediction within two hours.
Neural networks (NN), multi-rate regression, and autoregressive integrated moving average (ARIMA) models are the models most commonly used to study the evolution of a series and to make predictions.
In Velásquez et al. (2008), nonlinear models are used to predict the monthly electricity demand in Colombia using only the historical demand data. Among these models, the multilayer perceptron, the autoregressive neural network (ARNN), and the ARIMA model were compared, and the ARNN showed the lowest percentage error; see also (Amaris et al., 2017). Tang et al. (1991) compared three time series with different characteristics and concluded that, for time series with long memory, ARIMA and NN performed similarly, while for short memory the NN appeared to be superior. In contrast, for the prediction of solar radiation, Reikard (2009) concluded that ARIMA was superior. In another study, Adamowski et al. (2012) compared several linear and nonlinear regression, ARIMA, NN, and wavelet NN models for urban water demand forecasting, concluding that the wavelet NN was superior.
In this work, the fasting glucose level is analyzed to predict the following five values, comparing the ARNN and ARIMA models. The ARNN takes advantage of autoregressive (AR) models and the multilayer perceptron (MLP) to capture the complex dynamics of glucose levels. The ARIMA models are composed of three elements: an autoregressive (AR) component, an integrated (I) component, and a moving average (MA) component, which are useful for fitting longitudinal data. The ARNN was compared against the ARIMA model, and the two-layer, ten-neuron ARNN obtained error percentages below 25% for 73% of the signals.

Method
The data used in this work was obtained from the Diabetes-Data database, which contains data from 70 patients, including dates, glucose level monitoring times, and insulin dosages, along with food consumption and exercise performed (Michael, 2017).
The ARIMA and ARNN models describe one or more variables over time. These models have been applied to predicting currency exchange rates, rainfall levels, and energy consumption.
Artificial neural networks emulate the way the brain processes information and can approximate any function (Velásquez et al., 2008). The ARNN combines an autoregressive (AR) linear model with a multilayer perceptron (MLP) that contains one hidden layer, which allows it to use the advantages of both to capture complex dynamics (Velásquez et al., 2008; Velásquez et al., 2009). The architecture of an ARNN is shown in Fig. 1. In the ARNN model, the dependent variable is obtained by applying a nonlinear function to its previous values, which are the input values of the network (Eq. 1). The nonlinear function is the adaptive sigmoid function (Eq. 2). The model parameters are estimated by minimizing the regularized error, in which the regularization weight is a user-defined parameter (Breu et al., 2011).
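Because the ARNN description above relies on two equations and a regularized error, a commonly used formulation is sketched here for orientation. The symbols are assumptions chosen for illustration, not taken from the source: P is the number of lags, H the number of hidden neurons, and the plain logistic sigmoid stands in for the adaptive sigmoid, which may include an additional scaling. The weight-decay form of the regularizer is likewise an assumption.

```latex
% Assumed ARNN form: linear AR part plus one hidden layer of H neurons
y_t = \phi_0 + \sum_{p=1}^{P} \phi_p \, y_{t-p}
      + \sum_{h=1}^{H} \beta_h \, G\!\left( \omega_h + \sum_{p=1}^{P} \alpha_{p,h} \, y_{t-p} \right)
      + \varepsilon_t

% Logistic sigmoid used as the hidden-layer activation (assumed)
G(u) = \frac{1}{1 + e^{-u}}

% Regularized error with user-defined weight \lambda (weight decay, assumed);
% w_k ranges over the model parameters \phi_p, \beta_h, \omega_h, \alpha_{p,h}
E_\lambda = \sum_{t} \varepsilon_t^{2} + \lambda \sum_{k} w_k^{2}
```

In this form, setting H = 0 recovers a pure AR(P) model, which is why the ARNN can be read as an AR model extended with a nonlinear correction.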
Box et al. developed statistical models for time series (Box et al., 1994), in which each observed value is modeled as a function of previous values (Amaris et al., 2017; Breu et al., 2011; Casdagli, 1989; Broz and Viego, 2014). These models are known as ARIMA and are composed of the following parts: 1) autoregressive (AR), 2) integrated (I), and 3) moving average (MA), which together are used to fit the longitudinal data.
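The three parts can be written compactly with the backshift operator. The following is the standard textbook form of an ARIMA(p, d, q) model rather than a reproduction of the source's own equation. The symbols are introduced here for illustration: p is the autoregressive order, d the number of differences, q the moving-average order, and c an optional constant.

```latex
% Backshift operator: B^k y_t = y_{t-k}
\left( 1 - \sum_{i=1}^{p} \phi_i B^{i} \right) (1 - B)^{d} \, y_t
  = c + \left( 1 + \sum_{j=1}^{q} \theta_j B^{j} \right) \varepsilon_t
```

The factor (1 - B)^d is the integrated part: the series is differenced d times before the AR and MA terms are fitted.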
The ARIMA models predict the future values of a time series based on its historical behavior, without considering the underlying factors responsible for the variations of the dependent variable (Broz and Viego, 2014). The ARIMA workflow is shown in Fig. 2: the process starts by identifying the candidate model for the series to evaluate, followed by an estimation stage, which refers to selecting the appropriate data. Next, a validation stage takes place, and the process ends with the prediction of future values. An ARIMA model is specified by the orders of its three parts, written ARIMA(p, d, q).

Neural networks (NN) have also been used for time series prediction. A common error is failing to realize that the scientific community has no accepted methodology for this task, only a set of guidelines and critical steps that have been adapted from general heuristics, the researcher's ability, and previous knowledge of the analyzed series (Velásquez et al., 2008; Zhang et al., 1998).

Results
A series of tests was performed based on the literature review. The models were applied to the 70 subjects in the available database in order to compare their performance. Each series contains glucose level samples; 70% of the data was used for training and 30% for validating the predictions. Each series behaves differently, since each individual has a different lifestyle.

The ARIMA and ARNN models were each applied to the elements of the database. For the ARIMA model, the signals were used in weekly cycles, which showed the best results. The ARNN was applied to each of the available time series using three different configurations; Fig. 6 and 7 show the predicted values for each of the configurations used by the ARNN. The five-neuron configuration is plotted in red, the ten-neuron configuration in green, and the fifteen-neuron configuration in blue.

The results obtained using the two prediction models were then evaluated. The metrics used were the absolute error (AE), the mean squared error (MSE), and the root mean square error (RMSE). The results presented in this section correspond to the prediction of the five subsequent values of the glucose levels.

Other works (Ståhl, 2009) use time series sampled at intervals from 5 to 120 minutes (see, for example, Table 1 in (Hameed, 2020)) or, in other cases, continuous information (Pérez-Gandía, 2010). The data available to us is sampled approximately every 24 hours; however, this is the data that is available to DM patients, since they typically measure their sugar before breakfast. The results obtained with the ARIMA model were not close enough to the sampled glucose values: the predicted values were high, in particular when compared with the values obtained by the ARNN.

Linear regression is applied between the expected value and the predicted value; a line at a 45-degree angle represents high precision in the predictions. The scatterplots show the positive linear correlation between the sampled glucose levels and each of the model's predictions. In Fig. 8, the ten-neuron ARNN model is the one that most closely approximates a 45-degree straight line. It can also be observed that its data dispersion is lower than in the other models; thus, this is the best model in our evaluation. It is also possible to infer from our data that the ARIMA model is not appropriate for predicting glucose levels, or at least not when using univariate time series. The first, second, third, fourth, and fifth predictions can be observed in Fig. 9, 10, 11, 12, and 13, respectively.

The R-squared adjustment is a statistical measure of how well a model predicts the sampled data; in other words, it measures the relation between the predicted and the target variable. R-squared takes values between 0 and 1: a value close to 0 means that the regression does not explain the variance in the response, whereas a value close to 1 means that the regression explains the variance of the observed values well. Table 2 lists the R-squared values obtained for each prediction.
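The metrics named in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the study: the function names and the sample readings are hypothetical, and R-squared is computed as the squared Pearson correlation, which equals the coefficient of determination of a simple linear regression between expected and predicted values.

```python
import math


def absolute_errors(expected, predicted):
    """Absolute error (AE) for each expected/predicted pair."""
    return [abs(e - p) for e, p in zip(expected, predicted)]


def mse(expected, predicted):
    """Mean squared error."""
    errors = absolute_errors(expected, predicted)
    return sum(err ** 2 for err in errors) / len(errors)


def rmse(expected, predicted):
    """Root mean square error."""
    return math.sqrt(mse(expected, predicted))


def percent_errors(expected, predicted):
    """Absolute error as a percentage of the expected value."""
    return [100.0 * abs(e - p) / abs(e) for e, p in zip(expected, predicted)]


def share_within(expected, predicted, threshold=25.0):
    """Fraction of predictions whose percentage error is below the threshold."""
    pct = percent_errors(expected, predicted)
    return sum(1 for value in pct if value < threshold) / len(pct)


def r_squared(expected, predicted):
    """R-squared of a simple linear regression (squared Pearson correlation)."""
    n = len(expected)
    mean_e = sum(expected) / n
    mean_p = sum(predicted) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(expected, predicted))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_e * var_p)


# Hypothetical fasting glucose values in mg/dl (not taken from the database).
expected = [100, 120, 140, 160, 180]
predicted = [110, 115, 150, 150, 190]

print("AE:", absolute_errors(expected, predicted))
print("MSE:", mse(expected, predicted))
print("RMSE:", rmse(expected, predicted))
print("Share below 25% error:", share_within(expected, predicted))
print("R-squared:", r_squared(expected, predicted))
```

Computed this way, R-squared always lies between 0 and 1, which matches the interpretation given above. It measures how closely the points follow some straight line, so it should be read together with the error metrics when judging closeness to the 45-degree line.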
In Fig. 9, it can be observed that the first prediction of the ARIMA model underperforms, whereas the ten-neuron ARNN model approaches the expected value more closely. This is evident when comparing their respective coefficients of determination: the first prediction of the ARIMA model has a value of 0.03489, which is close to 0, while the ten-neuron ARNN has a value of 0.8007, which approaches 1. In Fig. 10, it can be observed that the models follow the same trend. The R-squared for the second prediction is 0.004939 for the ARIMA model and 0.658 for the ten-neuron ARNN. In the ARNN model with fifteen neurons, predictions four and five are not reliable, since their R-squared adjustment is very close to 0. When deciding whether to use this model to predict glucose levels, it is crucial to consider that its predictions would be sufficient only up to three values ahead.

The best model for predicting glucose levels is therefore the ARNN model with ten neurons. Its average absolute error, both per prediction and overall, is the lowest. In terms of the R-squared adjustment, it is also the model that finds the best relationship between the prediction and the target variable.
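The evaluation protocol described in this section, a 70/30 chronological split followed by the prediction of five subsequent values, can be sketched as follows. The sketch makes several assumptions that the source does not confirm: the five values are assumed to be produced recursively, feeding each prediction back as an input; the number of lags is arbitrary; and `mean_of_window` is only a placeholder for a trained one-step model, not ARNN or ARIMA. All names and readings are hypothetical.

```python
def split_series(series, train_fraction=0.7):
    """Chronological split: first part for training, the rest for validation."""
    cut = int(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


def make_lagged(series, n_lags):
    """Build (inputs, target) pairs: n_lags previous values -> next value."""
    pairs = []
    for t in range(n_lags, len(series)):
        pairs.append((list(series[t - n_lags:t]), series[t]))
    return pairs


def forecast_recursive(history, one_step, n_lags, horizon=5):
    """Predict `horizon` values, feeding each prediction back as an input."""
    window = list(history[-n_lags:])
    predictions = []
    for _ in range(horizon):
        y_hat = one_step(window)
        predictions.append(y_hat)
        window = window[1:] + [y_hat]  # slide the window forward by one step
    return predictions


def mean_of_window(window):
    """Placeholder one-step predictor (NOT ARNN or ARIMA): mean of the lags."""
    return sum(window) / len(window)


# Hypothetical fasting glucose readings in mg/dl, one per day.
glucose = [110, 125, 118, 140, 132, 121, 150, 138, 127, 119]

train, validation = split_series(glucose)
pairs = make_lagged(train, n_lags=3)  # training pairs for a one-step model
predicted = forecast_recursive(train, mean_of_window, n_lags=3, horizon=5)

print("train:", train)
print("validation:", validation)
print("five-step forecast:", predicted)
```

Any fitted one-step model can be passed in place of `mean_of_window`, as long as it maps a window of lagged values to the next value. Keeping the split chronological avoids training on readings that come after the ones being predicted.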

Conclusion
The performance of the ARIMA and ARNN models for the prediction of glucose levels was analyzed. The results show that the ARNN can predict up to five glucose values: in 73% of the cases, the error was below 25%. For the ARIMA model, on the other hand, only 6% of the cases had an error below 25%. It is important to mention that a prediction will never be completely accurate, since many variables related to each patient's behavior are not considered and cannot be controlled. Despite that, we have established that the ARNN is a viable option for glucose prediction, based on the relative and absolute errors both per prediction and overall. The ARNN was also the model that obtained the best R-squared adjustment between the predicted and sampled values. As future work, we would like to include categorical data in our database to classify the patients according to food consumption, physical activity, insulin dosage, and sampling time.